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Epochal dynamics, in which long periods of stasis in an evolving population are punctuated by a 
sudden burst of change, is a common behavior in both natural and artificial evolutionary processes. 
We analyze the population dynamics for a class of fitness functions that exhibit epochal behavior 
using a mathematical framework developed recently. In the latter the approximations employed 
led to a population-size independent theory that allowed us to determine optimal mutation rates. 
Here we extend this approach to include the destabilization of epochs due to finite-population 
fluctuations and show that this dynamical behavior often occurs around the optimal parameter 
settings for efficient search. The resulting, more accurate theory predicts the total number of fitness 
function evaluations to reach the global optimum as a function of mutation rate, population size, 
and the parameters specifying the fitness function. We further identify a generalized error threshold, 
smoothly bounding the two-dimensional regime of mutation rates and population sizes for which 
evolutionary search operates efficiently. 
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Evolutionary search algorithms are a class of stochas- 
tic optimization procedures inspired by biological evolu- 
tion, e.g see Refs. P, |hl||L3| , [2Cfl : a population of candidate 
solutions evolves under selection and random "genetic" 
diversification operators. Evolutionary search algorithms 
have been successfully applied to a diverse variety of opti- 
mization problems; see, for example Refs. p|-|^j9|,pl]| and 
references therein. Unfortunately, and in spite of a fair 
amount of theoretical investigation, the mechanisms con- 
straining and driving the dynamics of evolutionary search 
on a given problem are often not well understood. 

There are very natural difficulties that are responsible 
for this situation. In mathematical terms evolutionary 
search algorithms are population-based discrete stochas- 
tic nonlinear dynamical systems. In general, the con- 
stituents of the search problem, such as the structure of 
the fitness function, selection, finite-population fluctua- 
tions, and genetic operators, interact in complicated ways 
to produce a rich variety of dynamical behaviors that 
cannot be easily understood in terms of the constituents 
individually. These complications make a strictly empir- 
ical approach to the question of whether and how to use 
evolutionary search problematic. 

The wide range of behaviors exhibited by nonlinear 
population-based dynamical systems have been appreci- 
ated for decades in the field of mathematical population 
genetics. Unfortunately, this appreciation has not led to 
a quantitative predictive theory that is applicable to the 
problems of evolutionary search; something desired, if 
not required, for the engineering use of stochastic search 
methods. 



1 



We believe that a general, predictive theory of the dy- 
namics of evolutionary search can be built incrementally, 
starting with a quantitative analytical understanding of 
specific problems and then generalizing to more complex 
situations. In this vein, the work presented here contin- 
ues an attempt to unify and extend theoretical work in 
the areas of evolutionary search theory, molecular evolu- 
tion theory, and mathematical population genetics. Our 
strategy is to focus on a simple class of problems that, 
nonetheless, exhibit some of the rich behaviors encoun- 
tered in the dynamics of evolutionary search algorithms. 
Using analytical tools from statistical mechanics, dynam- 
ical systems theory, and the above mentioned fields we 
developed a detailed and quantitative understanding of 
the search dynamics for a class of problems that exhibit 
epochal evolution. On the one hand, this allows us to 
analytically predict optimal parameter settings for this 
class of problems. On the other hand, the detailed un- 
derstanding of the behavior for this class of problems 
provides valuable insights into the emergent mechanisms 
that control the dynamics in more general settings of evo- 
lutionary search and in other population-based dynami- 
cal systems. 

In a previous paper, Ref. JhJ, we showed how a de- 
tailed dynamical understanding, announced in Ref. |3l| l 
and expanded in Ref. p2fl , can be turned to practical 
advantage. Specifically, we determined how to set the 
mutation rate to reach, in the fewest number of fitness 
function evaluations, the global optimum in a wide class 
of fitness functions. Due to certain cancellations at the 
level of approximation used, the resulting theory lead to 
population-size independent predictions. 

Here we extend this theory to include additional impor- 
tant effects, such as the increased search effort caused by 
the dynamical destabilization of epochs, to be explained 
below, which reintroduce the dependence on population 
size. The result is a more accurate theory that analyt- 
ically predicts the total number of fitness function eval- 
uations needed on average for the algorithm to discover 
the global optimum of the fitness function. 

In addition, we develop a detailed understanding of the 
operating regime in parameter space for which the search 
is performed most efficiently We believe this will provide 
useful guidance on how to set search algorithm parame- 
ters for more complex problems. In particular, our theory 
explains the marginally stable behavior of the dynamics 
when the parameters are set to minimize search effort. 
Most simply put, the optimal parameter setting occurs 
when the dynamics is as stochastic as possible without 
corrupting information stored in the population about 
the location of the current best genotypes. The results 
raise the general question of whether it is desirable for 
optimal search to run in dynamical regimes that are a 
balance of stability and instability. The mechanisms we 
identify suggest how this balance is, in fact, useful. 



II. ROYAL STAIRCASE FITNESS FUNCTIONS 

Choosing a class of fitness functions to analyze is a 
delicate compromise between generality, mathematical 
tractability, and the degree to which the class is rep- 
resentative of problems often encountered in evolution- 
ary search. A detailed knowledge of the fitness function 
is very atypical of evolutionary search problems. If one 
knew the fitness function in detail, one would not have to 
run an evolutionary search algorithm to find high fitness 
solutions in the first place. The other extreme of assum- 
ing complete generality, however, cannot lead to enlight- 
ening results either, since averaged over all problems, all 
optimization algorithms perform equally well (or badly) ; 
see Ref. Q . We thus focus on a specific subset of fitness 
functions, somewhere between these extremes, that we 
believe at least have ingredients typically encountered in 
evolutionary search problems and that exhibit widely ob- 
served dynamical behaviors in both natural and artificial 
evolutionary processes. 

In our preceding paper, Ref. p0| , we justified in some 
detail our particular choice of fitness function both in 
terms of biological motivations and in terms of optimiza- 
tion engineering issues. In short, many biological sys- 
tems and optimization problems have highly degenerate 
genotype-to-phenotype maps; that is, the mapping from 
genetic specification to fitness is a many-to-one function. 
Consequently, the number of different fitness values that 
genotypes can take is much smaller than the number of 
different genotypes. 

Additionally, due to the many-to-one mapping and 
since genotype spaces are generally of very high dimen- 
sionality, the genotype space tends to break into net- 
works of "connected" sets of equal-fitness genotypes that 
can reach each other via elementary genetic variation 
steps such as point mutations. These connected subsets 
of isofitness genotypes are generally referred to as "neu- 
tral networks" in molecular evolution theory, see Refs. 
]l0|,y|Jl5U2|,||. This leads us to posit that the geno- 
type space for general search problems decomposes into 
a number of such neutral networks. We also assume that 
higher fitness networks are smaller in volume than low 
fitness networks. Finally, we assume that from any neu- 
tral network there exist connections to higher fitness net- 
works such that, taken as a whole, the fitness landscape 
has no local optima other than the global optimum. 

Under these assumptions, genotype space takes on a 
particular type of architecture: "subbasins" of the neu- 
tral networks are connected by "portals" leading between 
them and so to higher or lower fitness. Stated in the sim- 
plest terms possible, the evolutionary population dynam- 
ics then becomes a type of diffusion constrained by this 
architecture. For example, individuals in a population 
diffuse over neutral networks until a portal to a network 
of higher fitness is discovered and the population moves 
onto this network. 

In order to model the behavior associated with the 
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subbasin-portal architecture, we defined the class of 
Royal Staircase fitness functions that capture the essen- 
tial elements sketched above. Importantly, this class of 
fitness functions is simple enough to admit a fairly de- 
tailed quantitative mathematical analysis of the associ- 
ated epochal evolutionary dynamics. 

The Royal Staircase fitness functions are defined as 
follows. 

1. Genomes are specified by binary strings s = 
S1S2 ■ ■ ■ SL,Si G {0, 1}, of length L — NK. 

2. Reading the genome from left to right, the number 
I(s) of consecutive Is is counted. 

3. The fitness f(s) of string s with I(s) consecutive 
ones, followed by a zero, is f(s) = 1 + [I(s)/K\. 
The fitness is thus an integer between 1 and N + 1 . 

Four observations are in order. 

1. The fitness function has two parameters, the num- 
ber N of blocks and the number K of bits per block. 
Fixing them determines a particular optimization 
problem or fitness "landscape" . 

2. There is a single global optimum: the genome 
,s = 1 L — namely, the string of all Is — with fitness 
f(s)=N + l. 

3. The proportion p n of genotype space filled by 
strings of fitness n is given by: 

p n=2 - K ^(l-2- K ), (1) 

for n < N. Thus, high fitness strings are exponen- 
tially more rare than low fitness strings. 

4. For each block of K bits, the all-Is pattern is the 
one that confers increased fitness on a string. With- 
out loss of generality, any of the other 2^ — 1 con- 
figurations could have been chosen as the "correct" 
configuration, including different patterns for each 
of the N blocks. Furthermore, since the GA here 
does not use crossover, arbitrary permutations of 
the L bits in the fitness function definition leave 
the evolutionary dynamics unchanged. 

The net result is that the Royal Staircase fitness func- 
tions implement the intuitive idea that increasing fitness 
is obtained by setting more and more bits in the genome 
"correctly". One can only set correct bit values in sets 
of K bits at a time, creating an "aligned" block, and 
in blocks from left to right. A genome's fitness is pro- 
portional to the number of such aligned blocks. And 
since the (n + l)st block only confers fitness when all n 
previous blocks are aligned as well, there is contingency 
between blocks. This realizes our view of the underlying 
architecture as a set of isofitness genomes that occur in 
nested neutral networks of smaller and smaller volume. 
(Cf. Figs. 1 and 2 of Rcf. @.) 



III. THE GENETIC ALGORITHM 

For our analysis of evolutionary search we have chosen 
a simplified form of a genetic algorithm (GA) that does 
not include crossover and that uses fitness-proportionate 
selection. The GA is defined by the following steps. 

1. Generate a population of M bit strings of length 
L = NK with uniform probability over the space 
of .L-bit strings. 

2. Evaluate the fitness of all strings in the population. 

3. Stop, noting the generation number i op t, if a string 
with optimal fitness iV+1 occurs in the population. 
Else, proceed. 

4. Create a new population of M strings by select- 
ing, with replacement and in proportion to fitness, 
strings from the current population. 

5. Mutate, i.e. change, each bit in each string of the 
new population with probability q. 

6. Go to step 2. 

When the algorithm terminates there have been E = 
Mt op t fitness function evaluations. 

In Ref. ]30[| we motivated our excluding crossover and 
discussed at some length the reasons that crossover's role 
in epochal evolution is not expected to be significant due 
to population convergence effects. 

This GA effectively has two parameters: the mutation 
rate q and the population size M. A given optimization 
problem is specified by the fitness function in terms of N 
and K. Stated most prosaically, then, the central goal of 
the following analysis is to find those settings of M and 
q that minimize the average number (E) of fitness func- 
tion queries for given N and K required to discover the 
global optimum. Our approach is to develop analytical 
expressions for £ as a function of N, K , M, and q and 
then to study the search-effort surface E(q, M) at fixed 
N and K. Before beginning the analysis, however, it is 
helpful to develop an appreciation of the basic dynami- 
cal phenomenology of evolutionary search on this class of 
fitness functions. Then we will be in a position to lay out 
the evolutionary equations of motion and analyze them. 



IV. OBSERVED POPULATION DYNAMICS 

The typical behavior of a population evolving on a fit- 
ness "landscape" of connected neutral networks, such as 
defined above, alternates between long periods (epochs) 
of stasis in the population's average fitness and sudden 
increases (innovations) in the average fitness. (See, for 
example, Fig. 1 of Ref. @ and Fig. 1 of Ref. Q.) 
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We now briefly recount the experimentally observed 
behavior of typical Royal Staircase GA runs in which the 
parameters q and M are set close to their optimal setting. 



The reader is referred to Ref. |32) for a detailed discus- 
sion of the dynamical regimes this type of GA exhibits 
over a range of different parameter settings. 
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FIG. 1. Examples of the Royal Staircase GA population dynamics with different parameter settings. The four plots show 
best fitness in the population (upper lines) and average fitness in the population (lower lines) as a function of time, measured 
in generations. The fitness function and GA parameters are given in each plot. In each case we have chosen q and M in the 
neighborhood of their optimal settings (see later) for each of the four values of N and K. 



Figure |l| illustrates the GA's behavior at four different 
parameter settings. Each individual figure plots the best 
fitness in the population (upper lines) and the average 
fitness (/) in the population (lower lines) as a function of 
the number of generations. Each plot is produced from a 
single GA run. In all of these runs the average fitness (/) 
in the population goes through stepwise changes early in 
the run, alternating epochs of stasis with sudden innova- 
tions in fitness. Later in each run, especially for those in 
Figs. [|(b) and 0(d), (/) tends to have higher fluctuations 
and the epochal nature of the dynamics becomes unclear. 

In the GA runs the population starts out with strings 
that only have relatively low fitness, say fitness n (in all 
four plots of Fig. |] we have n = 1). Selection and mu- 
tation then establish an equilibrium in the population 
until a string aligns the nth block, and descendants of 



this string with fitness n + 1 spread through the popula- 
tion. A new equilibrium is then established until a string 
of fitness n + 2 is discovered and so on, until finally, a 
string of fitness N + 1 is discovered. 

Notice that (/) roughly tracks the epochal behavior of 
the best fitness in the population. Every time a newly 
discovered higher fitness string has spread through the 
population, (/) reaches a new, higher equilibrium value 
around which it fluctuates. As a run progresses to higher 
epochs, (/) tends to have higher fluctuations and the 
epochal nature of the dynamics is obscured. This is a re- 
sult of the fact that for the highest epochs the difference 
between (/) in consecutive epochs is smaller than the av- 
erage fitness fluctuations induced by the finite-population 
sampling; see Ref. [ ]32| . 

Notice, too, that often the best fitness shows a series of 
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brief jumps to higher fitness during an epoch. When this 
occurs strings of higher fitness are discovered but, rather 
than spreading through the population, are lost within a 
few generations. 

For each of the four settings of N and K we have cho- 
sen the values of q and M such that the average total 
number (E) of fitness function evaluations to reach the 
global optimum for the first time is minimal. Thus, the 
four plots illustrate the GA's typical dynamics close to 
optimal (q, M)-parameter settings. 

Despite what appears at first blush to be relatively 
small variations in fitness function and GA parameters, 
there is a large range, almost a factor of 10, in times to 
reach the global optimum across the runs. Thus, there 
can be a strong parameter dependence in search times. It 
also turns out that the standard deviation er of the mean 
total number (E) of fitness function evaluations is of the 
same order as (E). (See Table |.) Thus, there are large 
run-to-run variations in the time to reach the global op- 
timum. This is true for all parameter settings with which 
we experimented, of which only a few are reported here. 

Having addressed the commonalities between runs, we 
now turn to additional features that each illustrates. Fig- 
ure ^](a) shows the results of a GA run with TV = 8 blocks 
of K = 8 bits each, a mutation rate of q = 0.005, and a 
population size of M — 200. During the epochs, the best 
fitness in the population hops up and down several times 
before it finally jumps up and the new more-fit strings 
stabilize in the population. This transition is reflected in 
the average fitness also starting to move upward. In this 
particular run, it took the GA approximately 3.4 x 10 5 fit- 
ness function evaluations (1700 generations) to discover 
the global optimum for the first time. Over 500 runs, the 
GA takes on average 5.3 x 10 5 fitness function evalua- 
tions to reach the global optimum for these parameters. 
The inherent large per-run variation means in this case 
that some runs take less than 10 5 function evaluations 
and that others take many more than 10 6 . 

Figure |l|(b) plots a run with N = 6 blocks of length 
K — 6 bits, a mutation rate of q = 0.016, and a popula- 
tion size of M = 150. The GA discovered the global opti- 
mum after approximately 4.8 x 10 4 fitness function eval- 
uations (325 generations). For these parameters, the GA 
uses approximately 5.5 x 10 4 fitness function evaluations 
on average to reach the global fitness optimum. Notice 
that the global optimum is only consistently present in 
the population between generations 530 generation 570. 
After that, the global optimum is lost again until after 
generation 800. As we will show, this is a typical fea- 
ture of the GA's behavior for parameter settings close to 
those that give minimal (E) . The global fitness optimum 
often only occurs in relatively short bursts after which it 
is lost again from the population. Notice also that there 
is only a small difference in (/) depending whether the 
best fitness is either 6 or 7 (the optimum). 

Figure |l|(c) shows a run for a small number (N = 4) 
of large (K = 10) blocks. The mutation rate is q = 0.014 
and the population size is again M = 150. As in the three 



other runs we see that (/) goes through epochs punctu- 
ated by rapid increases in (/). We also see that the best 
fitness in the population jumps several times before the 
population fixes on a higher fitness string. The GA takes 
about 1.9 x 10 5 fitness function evaluations on average to 
discover the global optimum for these parameter settings. 
In this run, the GA first discovered the global optimum 
after 2.7 x 10 5 fitness function evaluations. Notice that 
the optimum never stabilized in the population. 

Finally, Fig. |l|(d) shows a run with a large number 
(N = 10) of small (K — 5) blocks. The mutation rate is 
q = 0.008 and the population size is M = 100. Notice 
that in this run, the best fitness in the population alter- 
nates several times between fitnesses 8, 9, and 10 before 
it reaches (fleetingly) the global fitness optimum of 11. 
Quickly after it has discovered the global optimum, it 
disappears again and the best fitness in the population 
largely alternates between 9 and 10 from then on. It is 
notable that this intermittent behavior of the best fitness 
is barely discernible in the behavior of (/). It appears to 
be lost in the "noise" of the average fitness fluctuations. 
The GA takes about 1.2 x 10 5 fitness function evalua- 
tions on average at these parameter settings to reach the 
global optimum; while in this particular run the GA took 
1.6 x 10 5 fitness function evaluations (1640 generations) 
to briefly reach the optimum for the first time. 

V. STATISTICAL DYNAMICS OF 
EVOLUTIONARY SEARCH 

In Refs. |n| and [[32| we developed the statistical dy- 
namics of genetic algorithms to analyze the behavioral 
regimes of a GA searching the Royal Road fitness func- 
tions, which are closely related to the Royal Staircase 
fitness functions just defined. The analysis here builds 
on those results and, additionally, is a direct extension 
of the optimization analysis and calculations in Ref . |30| . 
We briefly review the essential points from these previ- 
ous papers. We refer the reader to Ref. |5^] for a detailed 
description of the similarities and differences of our the- 
oretical approach with other theoretical approaches such 
as the work by Prugel-Bennett, Rattray, and Shapiro, 
Refs. |p5|-p7|, the diffusion equation methods developed 
by Kimura, Refs. |l^,|l^], and the quasispecies theory, 
Ref. @. 
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(E) 


o 


8 


8 


200 


0.005 


5.3 x 10 5 


2.1 x 10 5 


6 


6 


150 


0.016 


5.5 x 10 4 


3.0 x 10 4 


4 


10 


150 


0.014 


1.9 x 10 5 


1.0 x 10 5 


10 


5 


100 


0.008 


1.2 x 10 b 


4.9 x 10 4 



TABLE I. Mean {E) and standard deviations a of the ex- 
pected number of fitness function evaluations for the Royal 
Staircase fitness functions and GA parameters shown in the 
runs of Fig. hi. The estimates were made from 500 GA runs. 
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A. Macrostate Space 

Formally, the state of a population in an evolutionary 
search algorithm is only specified when the frequency of 
occurrence of each of the 2 L genotypes is given. Thus, the 
dimension of the corresponding microscopic state space is 
very large. One immediate consequence is that the evo- 
lutionary dynamic, on this level, is given by a stochastic 
(Markovian) operator of size 0(2 L x 2 L ). Generally, us- 
ing such a microscopic description makes analytical and 
quantitative predictions of the GA's behavior unwieldy. 
Moreover, since the practitioner is generally interested in 
the dynamics of some more macroscopic statistics, such 
as best and average fitness, a microscopic description is 
uninformative unless an appropriate projection onto the 
desired macroscopic statistic is found. 

With these difficulties in mind, we choose to describe 
the macroscopic state of the population by its fitness dis- 
tribution, denoted by a vector P = (Pi, P 2 , . . . , Pjy+i), 
where the components < Pf < 1 are the propor- 
tions of individuals in the population with fitness / 

1.2 Y ■ 1 . We refer to P as the phenotypic quasis- 

pecies, following its analog in molecular evolution theory; 
see Refs. Since P is a distribution, it is normalized: 

N+l 

E p f = L ( 2 ) 

The average fitness (/) of the population is given by: 

JV+1 

(/> = E f p f- ( 3 ) 

/=i 



B. The Evolutionary Dynamic 

The fitness distribution P does not uniquely specify 
the microscopic state of the population; that is, there 
are many microstates with the same fitness distribution. 
An essential ingredient of the statistical dynamics ap- 
proach is to assume a maximum entropy distribution of 
microstates conditioned on the macroscopic fitness distri- 
bution. Note that our approach shares a focus on fitness 
distributions and maximum entropy methods with that 
of Prugel-Bennett, Rattray, and Shapiro, Refs. j25| p7[ . 
In our case, the maximum entropy assumption entails 
that, given a fitness distribution P(t) at generation t, 
each microscopic population state with this fitness distri- 
bution is equally likely to occur. Given this assumption, 
we can construct a generation operator G that acts on 
the current fitness distribution to give the expected fitness 
distribution of the population at the next time step. (See 
P(t) ->■ G[P(t)} illustrated in Fig. |.) In the limit of infi- 
nite populations, which is similar to the thermodynamic 
limit in statistical mechanics, this operator G maps the 



current fitness distribution P(t) deterministically to the 
fitness distribution P(t + 1) at the next time step; that 
is, 

P(t + l) = G[P(t)}. (4) 

Simulations indicate that for very large populations 
(M ^ 2 L ) the dynamics on the level of fitness distri- 
butions is indeed deterministic and given by the above 
equation; thereby justifying the maximum entropy as- 
sumption in this limit. 

The operator G consists of a selection operator S and 
a mutation operator M: 

G = M ■ S. (5) 

The selection operator encodes the fitness-level effect of 
selection on the population; and the mutation operator, 
the fitness-level effect of mutation. Appendixes [A] and 
[b| review the construction of these operators for our GA 
and the Royal Staircase fitness functions. 

For now, we note that the infinite population dynam- 
ics can be obtained by iteratively applying the operator 
G to the initial fitness distribution P(0). Thus, the so- 
lutions to the macroscopic equations of motion, in the 
limit of infinite populations, are formally given by 

P(t) = G«[P(0)] . (6) 

Recalling Eq. (|l|), it is easy to see that the initial fitness 
distribution P(0) is given by: 

P„(0) = 2- K{n -^ (1 - 2- K ) , 1 < n < JV , (7) 

and 

P N+1 (0) = 2- KN . (8) 

As shown in Refs. and (3^], despite G's nonlinearity, 
it can be linearized such that the tth iterate G^ can 
be directly obtained by solving for the eigenvalues and 
eigenvectors of the linearized version G. This leads to a 
closed-form solution of the infinite-population dynamics 
specified by Eq. (^). 

C. Finite Population Sampling 

For large (M ~ 2 L ) and infinite populations the dy- 
namics of the fitness distribution is qualitatively very 
different from the behavior shown in Fig. |l|: (/) in- 
creases smoothly and monotonically to an asymptote 
over a small number of generations. That is, there are 
no epochs. The reason is that for an infinite popula- 
tion, all genotypes are present in the initial population. 
Instead of the evolutionary dynamics discovering fitter 
strings over time, it essentially only expands the propor- 
tion of globally optimal strings already present in the 
initial population at t = 0. In spite of the qualitatively 
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different dynamics for large populations, we showed in 
Ref. [§2| that the (infinite population) operator G is the 
essential ingredient for describing the finite-population 
dynamics with its epochal dynamics as well. 

There are two important differences between the 
infinite-population dynamics and that with finite popu- 
lations. The first is that with finite populations the com- 
ponents P n cannot take on continuous values between 
and 1. Since the number of individuals with fitness n in 
the population is necessarily an integer, the values of P n 
are quantized in multiples of 1/M. Thus, the space of 
allowed finite population fitness distributions turns into 
a regular lattice in N + 1 dimensions with a lattice spac- 
ing of 1/M within the simplex specified by normalization 
(Eq. (§))■ 

Second, due to the sampling of members in the finite 
population, the dynamics of the fitness distribution is no 
longer deterministic. In general, we can only determine 
the conditional probabilities Pr[Q|P] that a given fitness 
distribution P leads to another Q = (Qi, . . . , Qjv+i) m 
the next generation. 



Pr(dl?) 




1/M 



G(P) 



P* 



FIG. 2. Illustration of the stochastic dynamics involved in 
going from one generation to the next starting with finite pop- 
ulation P, moving to the next (expected) population G[P], 
and then sampling to obtain the distribution Pr[Q|P] of fi- 
nite populations at the next time. The width of the latter 
distribution is inversely proportional to the population size 
M. Note that the underlying state space is a discrete lattice 
with spacing 1/M. 

It turns out that the probabilities Pr[Q|P] are given 
by a multinomial distribution with mean G[P]: 



JV+l 



Pr[Q|P] = Ml J] 



(GniP])™" 



l rl ! 



(9) 



where Q n = m n /M, with < m n < M integers. (The 
stochastic effects of finite sampling are illustrated in Fig. 
^|.) For any finite-population fitness distribution P the 
(infinite population) operator G gives the GA's average 
dynamics over one time step, since by Eq. (^) the ex- 
pected fitness distribution at the next time step is G [P] . 
Note that the components G„[P] need not be multiples 



of 1/M. Therefore, the actual fitness distribution Q at 
the next time step is not G [P] , but is instead one of the 
allowed lattice points in the finite-population state space. 
Since the variance around the expected distribution G[P] 
is proportional to 1/M, Q tends to be one of the lattice 
points close to G[P]. 



D. Epochal Dynamics 

For finite populations, the expected change (dP) in the 
fitness distribution over one generation is given by: 



(dP) = G[P] - P. 



(10) 



Assuming that some component (dp) is much smaller 
than 1/M, the actual change in component Pi is likely to 
be dp = for a long succession of generations. That is, 
if the size of the "flow" (dp) in some direction i is much 
smaller than the lattice spacing (1/M) for the finite pop- 
ulation, we expect the fitness distribution to not change 
in direction (fitness) i. 

In Refs. 1 31 1 and |3^] we showed this is the mechanism 
by which finite populations cause epochal dynamics. For 
the Royal Staircase fitness functions, we have that when- 
ever fitness n is the highest in the population, such that 
Pi = for all i > n, the rate at which higher fitness 
strings are discovered is very small: (dp) <C 1/M for all 
i > n and population size M not too large. A period 
of stasis (an evolutionary epoch) thus corresponds to the 
time the population spends before it discovers a higher 
fitness string. More formally, each epoch n corresponds 
to the population being restricted to a region in the n- 
dimensional lower-fitness subspace consisting of fitnesses 
1 to n of the macroscopic state space. Stasis occurs be- 
cause the flow out of this subspace is much smaller than 
the finite-population induced lattice spacing. 

As the experimental runs of Fig. [y illustrated, each 
epoch in the average fitness is associated with a (typi- 
cally) constant value of the best fitness in the popula- 
tion. More detailed experiments reveal that not only is 
(/) constant on average during the epochs, in fact the en- 
tire fitness distribution P fluctuates in an approximately 
Gaussian way around some constant fitness distribution 
P n during the epoch n — the generations when n is the 
highest fitness in the population. 

As was shown in Ref. J32], each epoch fitness distri- 
bution P™ is the unique fixed point of the operator G 
restricted to the n-dimensional subspace of strings with 
1 < / < That is, if G™ is the projection of the opera- 
tor G onto the n-dimensional subspace of fitnesses from 
1 up to n, then we have: 



G"[P n ] = P" 



(11) 



By Eq. (||), then, the average fitness /„ in epoch n is 
given by: 
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/n = £j P "- (12) 

Thus, the fitness distributions P™ during epoch n are ob- 
tained by finding the fixed point of G restricted to the 
first n dimensions of the fitness distribution space. We 
will pursue this further in the next section. 

To summarize at this point, the statistical dynamics 
analysis is tantamount to the following qualitative pic- 
ture. The global dynamics can be viewed as an incre- 
mental discovery of successively more (macroscopic) di- 
mensions of the fitness distribution space. Initially, only 
strings of low fitness are present in the initial population. 
The population stabilizes on the epoch fitness distribu- 
tion P n corresponding to the best fitness n in the initial 
population. The fitness distribution fluctuates around 
the n-dimensional vector P n until a string of fitness n + 1 
is discovered and spreads through the population. The 
population then settles into (n + l)-dimensional fitness 
distribution P n+1 until a string of fitness n + 2 is discov- 
ered, and so on, until the global optimum at fitness N + l 
is found. In this way, the global dynamics can be seen 
as stochastically hopping between the different epoch dis- 
tributions P n , unfolding a new macroscopic dimension of 
the fitness distribution space each time a higher fitness 
string is discovered. 

Whenever mutation creates a string of fitness n+l, 
this string may either disappear before it spreads, seen 
as the transient jumps in best fitness in Fig. [|, or it 
may spread, leading the population to fitness distribution 
pn+i We call the latter process an innovation. Through 
an innovation, a new (macroscopic) dimension of fitness 
distribution space becomes stable. Fig. [l] also showed 
that it is possible for the population to fall from epoch 
n (say) down to epoch n — 1. This happens when, due 
to fluctuations, all individuals of fitness n are lost from 
the population. We refer to this as a destabilization of 
epoch n. Through a destabilization, a dimension can, so 
to speak, collapse. For some parameter settings, such as 
shown in Figs. [j](a) and[l](c), this is very rare. In these 
cases, the time for the GA to reach the global optimum is 
mainly determined by the time it takes to discover strings 
of fitness n + 1 in each epoch n. For other parameter 
settings, however, such as in Figs. |l|(b) and 0(d), the 
destabilizations play an important role in how the GA 
reaches the global optimum. In these regimes, destabi- 
lization must be taken into account in calculating search 
times. This is especially important in the current setting 
since, as we will show, the optimized GA often operates 
in this type of marginally stable parameter regime. 



this fixed point we linearize the generation operator by 
taking out the factor (/) , thereby defining a new operator 
G" via: 

G" = ^yG", (13) 

where (/) is the average fitness of the fitness distribution 
that G™ acts upon; see App. |A|. The operator G n is just 
an ordinary (linear) matrix operator and the quasispecies 
fitness distribution P n is nothing other than the princi- 
pal eigenvector of this matrix (normalized in probability). 
Conveniently, one can show that the principal eigenvalue 
/„ of G™ is also the average fitness of the quasispecies 
distribution. In this way, obtaining the quasispecies dis- 
tribution P n reduces to calculating the principal eigen- 
vector of the matrix G™. Again, the reader is referred to 
Ref. ||. 

The matrices G™ are generally of modest size: i.e., 
their dimension is smaller than the number of blocks N 
and substantially smaller than the dimension of geno- 
type space. Due to this we can easily obtain numerical 
solutions for the epoch fitnesses f n and the epoch quasis- 
pecies distributions P n . For a clearer understanding of 
the functional dependence of the epoch fitness distribu- 
tions on the GA's parameters, however, App. ^ recounts 
analytical approximations to the epoch fitness levels f n 
and quasispecies distributions P n developed in Ref. Q . 

The result is that the average fitness /„ in epoch n, 
which is given by the largest eigenvalue, is equal to the 
largest diagonal component of the analytical approxima- 
tion to G™ derived in App. That is, 

f n =n(l-q)^ K . (14) 

The epoch quasispecies is given by: 

pn _ (l-A)nA- 1 -' ^ n\"-*-j 

3=1 J 

where A = (1 — q) is the probability that a block will 
undergo no mutations. For the following, we are actually 
interested in the most-fit quasispecies component P™ in 
epoch n. For this component, Eq. ( |l5|) reduces to 

n— 1 p r 
pn = A n-1 -Q llLZlL. , (16) 

j=l 

where we have expressed the result in terms of of the 
epoch fitness levels fj. 



VI. QUASISPECIES DISTRIBUTIONS AND 
EPOCH FITNESS LEVELS 

During epoch n the quasispecies fitness distribution P n 
is given by a fixed point of the operator G™. To obtain 



VII. MUTATION RATE OPTIMIZATION 

In the previous sections we argued that the GA's be- 
havior can be viewed as (occasionally) stochastically hop- 
ping from epoch to epoch — when the search discovers a 
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string with increased fitness that spreads in the popula- 
tion. Assuming the total time to reach this global op- 
timum is dominated by the time the GA spends in the 
epochs, Ref. |30| developed a way to tune the mutation 
rate q such that the time the GA spends in an epoch is 
minimized. We briefly review this here before moving on 
to the more general theory that includes population-size 
effects and epoch destabilization. 

Optimizing the mutation rate amounts to finding a 
balance between two opposing effects of varying muta- 
tion rate. On the one hand, when the mutation rate is 
increased, the average number of mutations in the un- 
aligned blocks goes up thereby increasing the probability 
of creating a new aligned block. On the other hand, due 
to the increased number of deleterious mutations, the 
equilibrium proportions P™ of individuals in the highest 
fitness class during each epoch n decreases. 

In Ref. pc[] we derived an expression for the probability 
C n+ i to create, over one generation in epoch n, a string 
of fitness n + 1 that will stabilize by spreading through 
the population. This is given by 



1 



+1 



MP«P a7 r„(A) 



(17) 



where P a = (1 — A)/(2 K — 1) is the probability of align- 
ing a block (App. 0) and 7r n (A) is the probability that 
a string of fitness n + 1 will spread, as opposed to being 
lost through a fluctuation or a deleterious mutation. This 
latter probability largely depends on the relative average 
fitness difference of epoch n + 1 over epoch n. Denoting 
this difference as 



In 



fn+1 fn 
In 



1 



n , 



1. 



(18) 



and using a diffusion equation approximation (see Ref. 
H), we found: 



tt„(A) = 



1-(1-^) 2M 7 " +1 
l-(l-P^) 



+ n2Af7„ + l ' 



(19) 



If P™+i 3> this reduces to a population-size inde- 

pendent estimate of the spreading probability 

7r„«l-e- 2 T". (20) 

If one allows for changing the mutation rate between 
epochs, one would minimize the time spent in each epoch 
by maximizing C„+i. Note that C„+i depends on q only 
through A. The optimal mutation rate in each epoch n is 
determined by estimating the optimal value A D of A for 
each n. Although the optimal A Q can be determined as 
the solution of an algebraic equation, we found in Ref. 
[P0|| that it is well approximated by 



A (n) » 1 



1 



3n 1.175 ■ 

For large n this gave the optimal mutation rate as 



(21) 



SKn 1 - 175 



,n > 1 



(22) 



Thus, the optimal mutation rate drops as a power-law 
in both n and K. This implies that if one is allowed to 
adapt the mutation rate during the run, the mutation 
rate should decrease as a GA run progresses so that the 
search will find the global optimum as quickly as pos- 
sible. We pursue this idea more precisely elsewhere by 
considering an adaptive mutation rate scheme for a GA. 

We now turn to the simpler problem of optimizing 
mutation rate for the case of a constant mutation rate 
throughout a GA run. In Ref. ||(]] we used Eq. ( |l7|) to es- 
timate the total number E of fitness function evaluations 
the GA uses on average before an optimal string of fitness 
N + 1 is found. As a first approximation, we assumed 
that the GA visits all epochs, that the time spent in in- 
novations between them is negligible, and that epochs 
are always stable. By epoch stability we that mean it is 
highly unlikely that strings with the current highest fit- 
ness will disappear from the population through a fluctu- 
ation, once such strings have spread. These assumptions 
appear to hold for the parameters of Figs. 0(a) and 0(c). 
They may hold even for the parameters of Fig. 0(b), but 
they most likely do not for Fig. 0(d). For the parame- 
ters of Fig. 0(d), we see that the later epochs (n = 9, 
and 10) easily destabilize a number of times before the 
global optimum is found. Although we will develop a 
generalization that addresses this more complicated be- 
havior in the next sections, it is useful to work through 
the optimization of mutation rate first. 

The average number T„ of generations that the popu- 
lation spends in epoch n is simply l/C„+i, the inverse of 
the probability that a string of fitness n+1 will be discov- 
ered and spread through the population. For a popula- 
tion of size M, the number of fitness function evaluations 
per generation is M, so that the total number E n of fit- 
ness function evaluations in epoch n is given by MT n . 
More explicitly, we have: 



(23) 



That is, the total number of fitness function evaluations 
in each epoch is independent of the population size M. 
This is due to two facts, given our approximations. First, 
the epoch lengths, measured in generations, are inversely 
proportional to M , while the number of fitness function 
evaluations per generation is M. Second, since for stable 
epochs P," ^> 1/M, the probability 7r„ is also indepen- 
dent of population size M; recall Eq. (|2(i|). 

The total number of fitness function evaluations E(X) 
to reach the global optimum is simply given by substitut- 
ing into Eq. (|23|) our analytical expressions for P™ and 
ir n , Eqs. ( jl6| ) and (po|), and then summing E n {\) over 
all epochs n from 1 to N . We found that: 



N 



E(X) = £ 



1 



^ P Q 7r„(A) L\ n\ n 

n=l v ' i—l 



n 



n\ T ' 



(24) 







Note that in the above equation ttn — 1 by definition 
because the algorithm terminates as soon as a string of 
fitness N + 1 is found. That is, strings of fitness N + 1 
need not spread through the population. The optimal 
mutation rate for an entire run is obtained by minimiz- 
ing Eq. ( [24]) with respect to A. 
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FIG. 3. Average total number (E) of fitness function eval- 
uations as a function of mutation rate q, from the theory 
(dashed), Eq. (|24[), and from experimental estimates (solid). 
The fitness function parameter settings are N = 4 blocks of 
length K — 10 bits. The mutation rate runs from q = 0.001 to 
q — 0.028. Experimental data points are estimates over 250 
runs. The experimental curves show four different population 
sizes; M = 80, M = 140, M = 230, and M = 380. 

Figure ||| shows for N = 4 blocks of length K = 10 
bits the dependence of the average total number E{q) of 
fitness function evaluations on the mutation rate q. The 
dashed line is the theoretical prediction of Eq. (|24|) ; while 
the solid lines show the experimentally estimated values 
of (E) for four different population sizes. Each experi- 
mental data point is an estimate obtained from 250 GA 
runs. Figure |3| illustrates in a compact form the findings 
of Ref. |30| |, which can be summarized as follows. 

1. At fixed population size M, there is a smooth cost 
function E{q) as a function of mutation rate q. It 
has a single and shallow minimum q a , which is ac- 
curately predicted by the theory. 

2. The curve E(q) is everywhere concave. 

3. The theory slightly underestimates the experimen- 
tally obtained (E). 

4. The optimal mutation rate q roughly occurs in the 
regime where the highest epochs are marginally sta- 
ble; see Fig. [|. 

5. For mutation rates lower than q a the experimen- 
tally estimated total number of fitness function 
evaluations (E) grows steadily and becomes almost 
independent of the population size M. (This is 
where the experimental curves in Fig. |^ overlap.) 
For mutation rates larger than q a the total number 
of fitness function evaluations does depend on M, 
which is not explained by the theory of Ref. |30| . 



6. There is mutational error threshold in q that 
bounds the upper limit in q of the GA's effi- 
cient search regime. Above the threshold, which 
is population-size independent, suboptimal strings 
of fitness N cannot stabilize in the population, even 
for very large population sizes. This error threshold 
is also correctly predicted by the theory. It occurs 
around q c = 0.028 for N = 4 and K = 10. 



VIII. EPOCH DESTABILIZATION: 
POPULATION-SIZE DEPENDENCE 



We now extend the above analysis to account for E's 
dependence on population size. This not only improves 
the parameter-optimization theory, but also leads us to 
consider a number of issues and mechanisms that shed 
additional light on how GAs work near their optimal pa- 
rameter settings. Since it appears that optimal param- 
eter settings often lead the GA to run in a behavioral 
regime were the population dynamics is marginally sta- 
ble in the higher epochs, we consider how destabilization 
dynamics affects the time to discover the global optimum. 

We saw in Figs. |l](b) and |l](d) that, around the opti- 
mal parameter settings, the best fitness in the population 
can show intermittent behavior. Apparently, fluctuations 
sometimes cause an epoch's current best strings (of fit- 
ness n) in the population to disappear. The best fitness 
then drops to n — 1. Often, strings of fitness n are re- 
discovered later on. Qualitatively, what happens during 
these destabilizations is that, since the proportion P™ of 
individuals in the highest fitness class decreases for in- 
creasing n and q (Eq. (jl6|)), for small population sizes 
the absolute number of individuals in the highest fitness 
class approaches a single string; i.e., MP™ 1 in higher 
epochs. When this happens, it is likely that all individu- 
als of fitness n are lost through a deleterious fluctuation 
and the population falls back onto epoch n— 1. Somewhat 
more precisely, whenever the standard deviation of fluc- 
tuations in the proportion P n of individuals with fitness 
n becomes as small as their equilibrium proportion P™, 
destabilizations start to become a common occurrence. 

Since the probability of a destabilization is sensitive to 
the population size M , this dynamical effect introduces 
population-size dependence in the average total number 
(E) of fitness function evaluations. 

As we just noted, the theory for E(q) used in Ref. 
fiof assumed that all epochs are stable, leading to a 
population-size independent theory. However, as is 
clear from Fig. |l|(d), one should take into account the 
(population-size dependent) probability of epoch n desta- 
bilizing several times to epoch n— 1 before the population 
moves to epoch n + 1. For example, if during epoch n 
the population is 3 times as likely to destabilize to epoch 
n — 1 compared to innovating to epoch n + 1, then we 
expect epoch n to disappear three times before moving 
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to epoch n+1. Assuming that epoch n~ 1 is stable, this 
increases the number of generations spent in epoch n by 
roughly three times the average number of generations 
spent in epoch n — 1. 

To make these ideas precise we introduce a Markov 
chain model to describe the "hopping" up and down be- 
tween the epochs. The Markov chain has N + 1 states, 
each representing an epoch. In every generation there 
are probabilities p\\ to innovate from epoch n to epoch 
n+1 and p~ to destabilize, falling from epoch n to epoch 
n— 1. The globally optimum state N + 1 is an absorbing 
state. Starting from epoch 1 we calculate the expected 
number T of generations for the population to reach the 
absorbing state for the first time. 

The innovation probabilities p^ are just given by the 
C n+1 of Eq. (Ill): 



Pn 



a 



n+1 



M 

E n ' 



(25) 



where E n is given by the approximation of Eq. (|23|). Note 
that when MP™ approaches 1 the spreading probability 
7r„ , as given by Eq. (|l^) , becomes population-size depen- 
dent as well, and we use Eq. ( |l9| ) rather than Eq. (20). 
To obtain the destabilization probabilities p~ we assume 
that in each generation the population has an equal and 
independent probability to destabilize to epoch n — 1. 
This probability is given by the inverse of the average 
time until a destabilization occurs. 

In Ref. |32| we studied the destabilization mechanism 
using a diffusion equation method. We derived an analyt- 
ical approximation for the average number of generations 
D n until epoch n destabilizes and falls back onto epoch 
n — 1. The result is: 



D, 



I - pn 



2//„ 



iMjlnPl 

1 - p™ 



erf 



I MfinO— PS) 



(26) 



where erf(ir) is the error function and erfi(x) = erf(ix)/i 
is the imaginary error function. 

In Ref. |3^] we pointed out the connection between 
the above formula and error thresholds in the theory of 
molecular evolution. Generally, error thresholds denote 
the boundary in parameter space between a regime where 
a certain high fitness string, or an equivalence class of 
high fitness strings, is stable in the population, and a 
regime where it is unstable. In the case of a single high 
fitness "master sequence" on e sp eaks of a genotypic er- 
ror threshold: see Refs. HgHliJ. In the case of an 
equivalence class of high fitness strings, one speaks of a 
phenotypic error threshold; see Refs. fl^j2^| . 

A sharply defined error threshold generally only oc- 
curs in the limit of infinite populations and infinite string 
length |L9| , but extensions to finite population cases have 
been studied in Refs. @JH,|§. In Ref. [§8|, for exam- 
ple, the occurrence of a finite-population phenotypic er- 
ror threshold was defined by the equality of the standard 



deviation and the mean of the number of individuals of 
the highest fitness class. This definition is in accord with 
Eq. @: the argument of erfi(x), v /Af^„P"/(l - P"), 
is exactly the ratio between the mean proportion P£ 
and standard deviation of the number of individuals 
with fitness n, as derived in Ref. |32| . The function 
crfi(ir) is a very rapidly growing function of its argu- 
ment: erfi(x) w exp(x 2 ) / x for x larger than 1. Therefore, 
yjMu n P™ / (1 — P") being either smaller (larger) than 1 
is a reasonable criterion for the instability (stability) of 
an epoch. Of course, this is simply a way of summarizing 
the more detailed information contained in Eq. (^6|). 

The constant \x n in Eq. ( |2^ ) is the average decay rate 
of fluctuations in the number of individuals in the highest 
fitness class around its equilibrium value P™. The value 
of \x n for epoch n can be calculated in terms of the relative 
sizes of the fluctuations in the directions of all lower- lying 
epochs. This calculation was performed explicitly in Ref. 
||32f . Formally, one needs to rotate the covariance matrix 
of sampling fluctuations during epoch n to the basis of 
epoch eigenvectors P % . The covariance matrix of sam- 
pling fluctuations during epoch n is approximately given 
by: 



(dPidPj) = 



M 



(27) 



Defining the matrix R such that its columns contain the 
epoch distributions P J : 



R; 



(28) 



we can rotate the covariance matrix to the basis of epoch 
vectors by using the inverse of R. The vector B con- 
tains the diagonal components of this rotated covariance 
matrix: 



_^ Tl — ± 



(29) 



k,m— 1 



The components Bi are proportional to the amplitude of 
fluctuations in the direction of epoch i during epoch n. 
The decay rate of fluctuations in the direction of epoch 
i is given by (/„ — fi)/fn- The decay rate fi n is then 
simply given by the average decay rates of fluctuations 
in the directions of the lower lying epochs, weighted by 
the sampling fluctuations vector B. That is, 



fin 



fi)Bi 



fn Si=l Bi 



(30) 



Generally, n n decreases monotonically as a function of n 
since fluctuations in the proportion P™ of individuals in 
the highest fitness class n decay more slowly for higher 
epochs. 

Thus, we have for the destabilization probabilities: 
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Pn = 



(31) 



Finally, note that the probability to remain in epoch n is 

1-Pn ~Pn- 

With these expressions for the transition probabilities 
of the Markov chain, it is now straightforward to calcu- 
late the average number T of generations before the GA 
discovers the global optimum for the first time; see for 
instance Sec. 7.4 of Ref. (l2|. The solution is: 



T = 
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where d>„ is defined as: 



and 



fc=2 P fe 



!>1 = 1. 



(32) 
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(34) 



Since Eq. (|32|) gives the average number T of genera- 
tions, the average number of fitness function evaluations 
E(q, M) is given by: 



E(q, M) = MT 

— En + En- 
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(35) 



where E n is given by Eq. (g3J) and where the last equal- 
ity is obtained by rewriting the sums in Eq. (|32|). It 
is easy to see that as epochs become arbitrarily stable 
(D n — > oo) this solution reduces to Eq. (pi). 



IX. THEORY VERSUS EXPERIMENT 

We can now compare this population-size dependent 
approximation, Eq. (35), with the experimentally mea- 



sured dependence on M of the average total number (E) 
of fitness function evaluations. Figure^ shows the depen- 
dence of (E) on the population size M for two different 
parameter settings of N and K and for a set of mutation 
rates q. 

The upper figures, Figs. f|(a) and (|(c), give the depen- 
dence of the experimentally estimated (E) on the pop- 
ulation size M. The lower figures, Figs. |i|(b) and [|(d), 
give the theoretical predictions from Eq. (Bq). The up- 
per left figure, Fig. |(a), shows (E) as a function of 



M for N = 4 blocks of length K = 10 for four differ- 
ent mutation rates: q E {0.013,0.015,0.017,0.019}. The 
population size ranges from M = 50 to M = 320. The to- 
tal number of fitness function evaluations on the vertical 
axis ranges from (E) = to (E) = 15 x 10 5 . Each data 
point was obtained as an average over 250 GA runs. Fig- 
ure [|(b) shows the theoretical predictions for the same 
parameter settings. Figure ^(c) gives the experimental 
estimates for N = 6 blocks of length K = 6, over the 
range M — 30 to M — 300, for four mutation rates: 
q e {0.018,0.02,0.022,0.024}. The total number of fit- 
ness function evaluations on the vertical axis range from 
(E) = to (E) = 7 x 10 5 . Figure |(d) shows the theo- 
retical predictions for the same range of M and the same 
four mutation rates. 

We see that as the population size becomes "too small" 
destabilizations make the total number of fitness func- 
tion evaluations increase rapidly. The higher the mu- 
tation rate, the higher the population size at which the 
sharp increase in (E) occurs. These qualitative effects are 
captured accurately by the theoretical predictions from 
Eq. ( p5| ) . Although our analysis involves several approx- 
imations (e.g. as in the calculations of D n ), the theory 
does quantitatively capture the population-size depen- 
dence well, both with respect to the predicted absolute 
number of fitness function evaluations and the shape of 
the curves as a function of M for the different mutation 
rates. From Figs. f|(c) an d ||(d) it seems that the theory 
overestimates the growth of (E) for the larger mutation 
rates as the population size decreases. Still, the theory 
correctly captures the sharp increase of (E) around a 
population size of M — 50. 

As the population size increases beyond approximately 
M = 200, we find experimentally that the average total 
number of fitness function evaluations (E) starts rising 
slowly as a function of M. This effect is not captured by 
our analysis. It is also barely discernible in Figs. |J(a) 
and |^(c). We believe that the slow increase of (E) for 
large population sizes derives from two sources. 

First, by the maximum entropy assumption, our theory 
assumes that all individuals in the highest fitness class are 
genetically independent, apart from the sharing of their 
aligned blocks. This is not true in general. The sampling 
of the selection process introduces genetic correlations in 
the individuals of the highest fitness class. Under our as- 
sumption of independence, doubling the population size 
from M to 2M should reduce the number of generations 
to find the global optimum by an equal factor of 2, mak- 
ing (E) independent of M. In reality, since the strings in 
the highest fitness class are correlated, doubling the pop- 
ulation size from M to 2M will reduce the total number of 
generations by a factor somewhat less than two, thereby 
increasing (E) slightly. Unfortunately, this effect is very 
hard to address quantitatively. 
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FIG. 4. Average total number (E) of fitness function evaluations as a function of the population size M for two different 
fitness function parameters and four mutation rates each, both experimentally ((a) and (c), top row) and theoretically ((b) 
and (d), bottom row). In each figure each solid line gives E(M) for a different mutation rate. Each experimental data point 
is an average over 250 GA runs. Figures (a) and (b) have N = 4 blocks of length K = 10. The upper figure (a) shows the 
experimentally estimated E(M) as a function of M for the mutation rates q € {0.013, 0.015, 0.017, 0.019}. The lower figure (b) 
shows the theoretical results, as given by Eq. (^5|), for the same parameter settings. In both, the population size ranges from 
M — 50 to M = 320 on the horizontal axis. Figures (c) and (d) have N = 6 blocks of length K = 6. Figure (c) shows the 
experimental averages and figure (d) the theoretical predictions for the same parameter settings. The population sizes on the 
horizontal axis run from M = 30 to M = 300. The mutation rates shown in (c) and (d) are q £ {0.018, 0.02, 0.022, 0.024}. 



The second reason for the increase of E with increas- 
ing population size comes from the time the population 
spends in the short innovations between the different 
epochs. Up to now, we have neglected these innovation 
periods. Generally, they only contribute marginally to 
E. In Ref. (3^] we calculated the approximate number of 
generations g n that the population spends in the innova- 
tion from epoch n to epoch n + 1 and found that: 

(36) 

In 

where 7 n is the fitness differential given by Eq. (|l^) . The 
GA thus expends a total of 

I = Mlog[A/]]T±±^, (37) 

n—1 

fitness function evaluations in the innovations. Notice 
that this number grows as M log [M] . Since the terms in 
the above sum are generally much smaller than E n , the 
contribution of I only leads to a slow increase in (E) as 
M increases. 



X. SEARCH-EFFORT SURFACE AND 
GENERALIZED ERROR THRESHOLDS 



We summarize our theoretical and experimental find- 
ings for the entire search-effort surface E(q, M) of the 
average total number of fitness function evaluations in 
Fig. § 

The figure shows the average total number E(q, M) of 
fitness function evaluations for N = 4 blocks of length 
K = 10 bits; the same fitness function as used in Figs. 
0(c)) S |( a )> and 1(b)- The top plot shows the theoret- 
ical predictions, which now include the innovation time 
correction from Eq. (|37|); the bottom, the experimental 
estimates. The horizontal axis ranges from a population 
size of M — 1 [M — 20, experimental) to a population 
size of M = 380 with steps of AM = 1 (AM = 30, ex- 
perimental) . The vertical axis runs from a mutation rate 
of q = 0.001 to q = 0.029 with steps of Aq = 0.00025 
in the theoretical plot and Aq = 0.002 in the experi- 
mental. The experimental search-effort surface is thus 
an interpolation between 195 data points on an equally 
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spaced lattice of parameter settings. Each experimental 
data point is an average over 250 GA runs. The contours 
range from E(q, M) = to E(q, M) = 2 x 10 6 with each 
contour representing a multiple of 10 5 . Note that the 
lowest values of E lie between 10 5 and 2 x 10 5 . Lighter 
gray scale corresponds to smaller values of E(q, M). 

The initial observations from Fig. [| were already ap- 
parent from Fig. || and Fig. [|. First, the theory cor- 
rectly predicts the relatively large region in parameter 
space where the GA searches most efficiently. Second, 
the theory correctly predicts the location of the optimal 
parameter settings, indicated by a dot in the upper plot 
of Fig. ^|. The optimum occurs for somewhat higher 
population size in the experiments, as indicated by the 
dot in the lower plot of Fig. ||. Due to the large vari- 
ance in E from run to run (recall Table ||) and the rather 
small differences in the experimental values of (E) near 
this regime, however, it is hard to infer from the exper- 
imental data exactly where the optimal population size 
occurs. Third, the theory underestimates the absolute 
magnitude of E(q,M) somewhat. Fourth, at small mu- 
tation rates E(q, M) increases more slowly for decreasing 
q in the theoretical plot than in the experimental plot. 
Apart from this, though, the plots illustrate the general 
shape of the search-effort surface E(q, M). 

There is a relatively large area of parameter space 
around the optimal setting (q a , M a ) for which the GA 
runs efficiently. Moving away from this optimal setting 
horizontally (changing M) increases E(q, M) only slowly 
at first. For decreasing M one reaches a "wall" relatively 
quickly around M = 30. For population sizes lower than 
M = 30, the higher epochs become so dynamically un- 
stable that it is difficult for the population to reach the 
global optimum string at all. In contrast, moving in the 
opposite direction, increasing population size, E(q, M) 
increases slowly over a relatively large range of M. Thus, 
choosing the population size too small is far more dele- 
terious than setting it too large. 

Moving away from the optimal setting vertically 
(changing q) the increase of E(q, M) is also slow at first. 
Eventually, as the plots make clear, increasing q one 
reaches the same "wall" as encountered in lowering M. 
This occurs at q » 0.02 in Fig. ||. For larger mutation 
rates the higher epochs become too unstable in this case 
as well, and the population is barely able to reach the 
global optimum. 

The wall in (q, M)-space is the two-dimensional ana- 
logue of a phenomenon known as the error threshold in 
the t heory of molecular evolution. As pointed out in Sec. 
VIII , in our case error thresholds form the boundary be- 
tween parameter regimes where epochs arc stable and 
unstable. Here, the error boundary delimits a regime in 
parameter space where the optimum is discovered rela- 
tively quickly from a regime, in the black upper left cor- 
ners of the plots, where the population essentially never 
finds the optimum. For too high mutation rates or too 
low population sizes, selection is not strong enough to 
maintain high fitness strings — in our case those close to 



the global optimum — in the population against sampling 
fluctuations and deleterious mutations. Strings of fitness 
N will not stabilize in the population but will almost 
always be immediately lost, making the discovery of the 
global optimum string of fitness N+l extremely unlikely. 
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FIG. 5. Contour plots of the search-effort surface E(q, M) 
of the average total number of fitness function evaluations 
for the theory (upper), Eqs. ([^ and (^), and for experi- 
mental estimates (lower). The parameter settings are N — 4 
blocks of length K — 10 bits. The population size M runs 
from M = 1 to M = 380 on the horizontal axis on the up- 
per plot and from M = 20 to M = 380 on the lower. The 
mutation rate runs from q = 0.001 to q = 0.029 on the ver- 
tical. The contours are plotted over the range E(q, M) — 
to E(q, M) = 2 x 10 6 with a contour at each multiple of 
10 5 . The experimental surface was interpolated from 195 
equally spaced data points, 13 increments of AM = 30 on 
the horizontal axis by 15 increments of Aq — 0.002 on the 
vertical. The theoretical surface was interpolated over a grid 
using AM — 1 and Aq = 0.00025. The optimal theoretical 
parameter setting, (q ,M ) = (0.011,60), and the optimal 
experimental parameter setting, (q ,M ) = (0.011,140), are 
marked in their respective plots with a dot. 
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Note that the error boundary rolls over with increas- 
ing M in the upper left corner of the plots. It bends all 
the way over to the right, eventually running horizon- 
tally, thereby determining a population-size independent 
error threshold. For our parameter settings this occurs 
around q « 0.028. Thus, beyond a critical mutation rate 
of q c 0.028 the population almost never discovers the 
global optimum, even for very large populations. 

The value of this horizontal asymptote q c can be 
roughly approximated by calculating for which mutation 
rate q c epoch N has exactly the same average fitness as 
epoch N — 1; i.e. find q c such that /jv ~ Jn-i- For those 
parameters, the population is under no selective pressure 
to move from epoch N — 1 to epoch TV. Thus, strings of 
fitness N will generally not spread in the population. Us- 
ing our analytic approximations, we find that the critical 
mutation rate q c is simply given by: 

&=i-VnF- (38) 

For the parameters of Fig. || this gives q c — 0.0284. This 
asymptote is indicated there by the horizontal line in the 
top plot. 

Similarly, below a critical population size M Cl it is also 
practically impossible to reach the global optimum, even 
for low mutation rates. This M c can also be roughly ap- 
proximated by calculating the population size for which 
the sampling noise is equal to the fitness differential be- 
tween the last two epochs. We find: 

M c = ( — ^— i ^ . (39) 

\N\-N+l) v ' 

For the parameters of Fig. [| this gives M c w 27 around 
q — 0.011. This threshold estimate is indicated by the 
vertical line in Fig. |^. 

Further, notice that for small mutation rates, at the 
bottom of each plot in Fig. |E|, the contours run almost 
horizontally. That is, for small mutation rates relative to 
the optimum mutation rate q Q , the total number of fitness 
function evaluations E(q, M) is insensitive to the popula- 
tion size M. Decreasing the mutation rate too far below 
the optimum rate increases E(q, M) quite rapidly. Ac- 
cording to our theoretical predictions it increases roughly 
as 1/q with decreasing q. The experimental data indicate 
that this is a slight underestimation. In fact, E(q,M) 
appears to increase as l/q a where the exponent a lies 
somewhere between 1 and 2. 

Globally, the theoretical analysis and empirical ev- 
idence indicate that the search-effort surface E(q, M) 
is everywhere concave. That is, for any two points 
(qi,Mi) and (q2,M 2 ), the straight line connecting these 
two points is everywhere above the surface E(q, M). We 
believe that this is always the case for mutation-only ge- 
netic algorithms with a static fitness function that has 
a unique global optimum. This feature is useful in the 
sense that a steepest descent algorithm on the level of the 



GA parameters q and M will always lead to the unique 
optimum (q ,M ). 

Finally, it is important to emphasize once more that 
there are large run-to-run fluctuations in the total num- 
ber of fitness evaluations to reach the global optimum. 
(Recall Table ffl.) Theoretically, each epoch has an expo- 
nentially distributed length since there is an equal and 
independent innovation probability of leaving it at each 
generation. The standard deviation of an exponential 
distribution is equal to its mean. Since the total time 
E(q, M) is dominated by the last epochs, the total time 
E(q, M) has a standard deviation close to its mean. 

One conclusion from this is that, if one is only going 
to use a GA for a few runs on a specific problem, there 
is a large range in parameter space for which the GA's 
performance is statistically equivalent. In this sense, fluc- 
tuations lead to a large "sweet spot" of GA parameters. 
On the other hand, these large fluctuations reflect the 
fact that individual GA runs do not reliably discover the 
global optimum within a fixed number of fitness function 
evaluations. 



XI. CONCLUSIONS 

We derived explicit analytical approximations to the 
total number of fitness function evaluations that a GA 
takes on average to discover the global optimum as a 
function of both mutation rate and population size. The 
class of fitness functions so analyzed describes a very gen- 
eral subbasin-portal architecture in genotype space. We 
found that the GA's dynamics consists of alternating pe- 
riods of stasis (epochs) in the fitness distribution of the 
population, with short bursts of change (innovations) to 
higher average fitness. During the epochs the most-fit in- 
dividuals in the population diffuse over neutral networks 
of isofitness strings until a portal to a network of higher 
fitness is discovered. Then descendants of this higher 
fitness string spread through the population. 

The time to discover these portals depends both on the 
fraction of the population that is located on the highest 
neutral net in equilibrium and the speed at which these 
population members diffuse over the network. Although 
increasing the mutation rate increases the diffusion rate 
of individuals on the highest neutral network, it also in- 
creases the rate of deleterious mutations that cause these 
members "fall off" the highest fitness network. The mu- 
tation rate is optimized when these two effects are bal- 
anced so as to maximize the total amount of explored 
volume on the neutral network per generation. The op- 
timal mutation rate, as given by Eq. (p2|), is dependent 
on the neutrality degree (the local branching rate) of the 
highest fitness network and on the fitnesses of the lower 
lying neutral networks onto which the mutants are likely 
to fall. 

With respect to optimizing population size, we found 
that the optimal population size occurs when the highest 
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epochs are just barely stable. That is, given the opti- 
mal mutation rate, the population size should be tuned 
such that only a few individuals are located on the high- 
est fitness neutral network. The population size should 
be large enough such that it is relatively unlikely that 
all the individuals disappear through a deleterious fluc- 
tuation, but not much larger than that. In particular, 
if the population is much larger, so that many individu- 
als are located on the highest fitness network, then the 
sampling dynamics causes these individuals to correlate 
genetically. Due to this genetic correlation, individuals 
on the highest fitness net do not independently explore 
the neutral network. This leads, in turn, to a deteri- 
oration of the search algorithm's efficiency. Therefore, 
the population size should be as low as possible without 
completely destabilizing the last epochs. Given this, one 
cannot help but wonder how general the association of 
efficient search and marginal stability is. 

It would appear that the GA wastes computational re- 
sources when maintaining a population quasispecies that 
contains many suboptimal fitness members; that is, those 
that are not likely to discover higher fitness strings. This 
is precisely the reason that the GA performs so much 
more poorly than a simple hill climbing algorithm on 
this particular set of fitness functions, as shown in Ref. 
plf . The deleterious mutations together with the na- 
ture of the selection mechanism drives up the fraction of 
lower fitness individuals in the quasispecies. If we allowed 
ourselves to tune the selection strength, we could have 
tuned selection so high that only the most-fit individuals 
would ever be present. As we will show elsewhere, this 
leads to markedly better performance, equal to or even 
slightly exceeding that of hill climbing algorithms. Thus, 
the GA's comparatively poor performance is the result 
of resources being wasted on the presence of suboptimal 
fitness individuals. 

In contrast, the reason that only the best individuals 
must be kept for optimal search is a result of the fact that 
our set of fitness functions has no local optima. If there 
are small fluctuations in fitness on the neutral networks 
or if there is noise in the fitness function evaluation, it 
might be beneficial to keep some of the lower fitness in- 
dividuals in the population. We will also pursue this 
elsewhere. 

For now, it suffices to recall once more the typical dy- 
namical behavior of the GA population around the opti- 
mal parameter settings. The GA searches most efficiently 
when population size and mutation rate are set such that 
the epochs are marginally stable. That is, the GA dy- 
namics is as "stochastic" as possible without destabiliz- 
ing the current and later epochs. Strings of fitness n 
are (only slightly) preferentially reproduced over strings 
with fitness n — 1, and the population size is just large 
enough to protect these fitness n strings from deleterious 
sampling fluctuations. 

More precisely, mutation rate, population size, and 
network neutralities set a lower bound Sf of fitness dif- 
ferentials that can be "noticed" by the selection mech- 



anism. This idea is closely related to so called "ne arly 
neutral" theories of molecular evolution of Refs. |p3| , p4f . 
For optimal parameter settings, the fitness differential 
Sf = n — (n — 1) = 1 is just barely detected by se- 
lection. Imagine that during epoch n there are strings 
which, given an additional K bits set correctly, obtain a 
fitness / + Sf, instead of / + 1. As a function of n, K, q 
and M we can roughly determine the minimal fitness dif- 
ferential Sf for these strings to be preferentially selected. 
We find that 

i/i (T^F(^7 + 1 - (1 -' ) ")- (40 » 

Below this fitness differential, strings of fitness f + Sf are 
effectively neutral with respect to strings with fitness n. 
The net result is that the parameters of the search, such 
as q and M, determine a coarse graining of fitness levels 
where strings in the band of fitness between n and n + Sf 
are treated as having equal fitness. 

In future work we explore how this coarse graining 
can be turned to good use by a GA for fitness functions 
that possess many shallow local optima — optima that on 
a coarser scale disappear so that the resulting coarse- 
grained fitness function induces a neutral network archi- 
tecture like that explored here. Intuitively, it should be 
possible to tune GA parameters so that local optima dis- 
appear below the minimal fitness differentials Sf and so 
that the GA efficiently searches the coarse-grained land- 
scape without becoming pinned to local optima. 
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APPENDIX A: SELECTION OPERATOR 

Since the GA uses fitness-proportionate selection, the 
proportion Pf of strings with fitness i after selection is 
proportional to i and to the fraction Pi of strings with fit- 
ness i before selection; that is, Pf = ci Pi. The constant 
c can be determined by demanding that the distribution 
remains normalized. Since the average fitness (/) of the 
population is given by Eq. (||), we have Pf = iPi/(f). 
In this way, we define a (diagonal) operator S that works 
on a fitness distribution P and produces the fitness dis- 
tribution P 8 after selection by: 

("A-g^h- (ai) 

Notice that this operator is nonlinear since the average 
fitness (/) is a function of the distribution P on which 
the operator acts. 

APPENDIX B: MUTATION OPERATOR 

The component M,j of the mutation operator gives 
the probability that a string of fitness j is turned into a 
string with fitness i under mutation. 

First, consider the components My with i < j. These 
strings are obtained if mutation leaves the first i—1 blocks 
of the string unaltered and disrupts the ith block in the 
string. Multiplying the probabilities that the preceding 
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2—1 blocks remain aligned and that the ith block be- 
comes unaligned we have: 



Mii = (i - qf~ 1)K (i - (i - q) K ) , i<j 



(Bl) 



The diagonal components Mjj are obtained when mu- 
tation leaves the first j — 1 blocks unaltered and does 
not mutate the jth block to be aligned. The maximum 
entropy assumption says that the jth block is random 
and so the probability P a that mutation will change the 
unaligned jth block to an aligned block is given by: 



Pa = 



!-(!-<?) 



K 



)K 



1 



(B2) 



This is the probability that at least one mutation will 
occur in the block times the probability that the mu- 
tated block will be in the correct configuration. Thus, 
the diagonal components are given by: 



Mtf = (1 - q)W(l - P a ). 



(B3) 



Finally, we calculate the probabilities for increasing- 
fitness transitions My with i > j. These transition prob- 
abilities depend on the states of the unaligned blocks j — 1 
through i. Under the maximum entropy assumption all 
these blocks are random. The jth block is equally likely 
to be in any of 2 K — 1 unaligned configurations. All suc- 
ceeding blocks are equally likely to be in any one of the 
2 K configurations, including the aligned one. In order for 
a transition to occur from state j to i, all the first j — 1 
aligned blocks have to remain aligned, then the jth block 
has to become aligned through the mutation. The latter 
has probability P a . Furthermore, the following i — j — 1 
blocks all have to be aligned. Finally, the ith block has 
to be unaligned. Putting these together, we find that: 



(1 



1 



q)^ 1)K Pa 
' 1 



1 



i > j 



(B4) 



The last factor does not appear for the special case of the 
global optimum, i = A^ + 1, since there is no (N + l)st 
block. 



APPENDIX C: EPOCH FITNESSES AND 
QUASISPECIES 

The generation operator G is given by the product of 
the mutation and selection operators derived above; i.e. 
G = M ■ S. The operators G n are defined as the projec- 
tion of the operator G onto the first n dimensions of the 
fitness distribution space. Formally: 



G?[P] = G i [P], i<n, 



(CI) 



and, of course, the components with i > n are zero. 

The epoch quasispecies P n is a fixed point of the op- 
erator G". As in Sec. VI, we take out the factor (/) 
to obtain the matrix G™ . The epoch quasispecies is now 



simply the principal eigenvector of the matrix G" and 
this can be easily obtained numerically. 

However, in order to obtain analytically the form of 
the quasispecies distribution P n during epoch n we ap- 
proximate the matrix G". As shown in App. |^, the 
components My (and so of G") naturally fall into three 
categories. Those with i < j, those with i > j, and 
those on the diagonal i = j. Components with i > j 
involve at least one block becoming aligned through mu- 
tation. These terms are generally much smaller than the 
terms that only involve the destruction of aligned blocks 
or for which there is no change in the blocks. We there- 
fore approximate G" by neglecting terms proportional to 
the rate of aligned-block creation — what was called P a in 
App. Under this approximation for the components 
of G n , we have: 



3(l- q y^ K (l-(l-q) K ), i<j 



and 



G?,=j(l-«) 



(C2) 



(C3) 



The components with i > j are now all zero. 

Note first that all components of G" only depend on 
q and K through A = (1 — q) K , the probability that an 
aligned block is not destroyed by mutation. Note further 
that in this approximation G™ is upper triangular. As is 
well known in matrix theory, the eigenvalues of an upper 
triangular matrix are given by its diagonal components. 
Therefore, the average fitness f n in epoch n, which is 
given by the largest eigenvalue, is equal to the largest 
diagonal component G™. That is, 



/„ = n(l - q) (n - 1)K = n\ Ti 



(C4) 



The principal eigenvector P n is the solution of the 
equation: 



n 

£(G*.-/ n( %)p; = o 



(C5) 



Since the components of G" depend on A in such a sim- 
ple way, we can analytically solve for this eigenvector; 
finding that the quasispecies components are given by: 



P" 



(1 - X)n\ r ' 



n\ r ' 



n 



nX n ~ j - j 
n A«-i-j - j 



For the component P™ this reduces to 
n-i 



n 11 nX n-l- 



3 = 1 



3 



(C6) 



(C7) 



The above equation can be re-expressed in terms of the 
epoch fitness levels ff. 



k - a- 1 n 



fn fj 
fn — A/, 



(C8) 
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